import urllib.request

from bs4 import BeautifulSoup

from ziqiang.celery import app

OUTPUT_PATH = 'wudaxinwen/templates/wudaxinwen/index.html'
LIST_URL = 'http://news.whu.edu.cn/wdyw.htm'
SITE_ROOT = 'http://news.whu.edu.cn/'


def download(url, user_agent='wswp'):
    """Fetch a page as bytes, sending a custom User-Agent header."""
    request = urllib.request.Request(url, headers={'User-Agent': user_agent})
    return urllib.request.urlopen(request).read()


@app.task
def getnews():
    """Scrape the first 25 entries of the WHU news listing and write each
    article's paragraphs into the template file as simple HTML tables."""
    # The listing page only needs to be fetched and parsed once,
    # not once per entry.
    soup = BeautifulSoup(download(LIST_URL), 'html.parser')
    # Opening the file once in 'w' mode replaces the previous contents,
    # so no separate truncate pass is needed.
    with open(OUTPUT_PATH, 'w', encoding='utf-8') as f:
        for i in range(25):
            # Each entry on the listing page carries id "lineu5_<n>".
            for item in soup.find_all(attrs={'id': 'lineu5_%d' % i}):
                for link in item.find_all(attrs={'class': 'gray'}):
                    article_url = SITE_ROOT + link.get('href')
                    article_soup = BeautifulSoup(download(article_url),
                                                 'html.parser')
                    paragraphs = article_soup.find(
                        attrs={'class': 'news_content'}).find_all('p')
                    for paragraph in paragraphs:
                        text = paragraph.get_text()
                        f.write('<table><tr><td>%s</td></tr></table>' % text)
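As a standalone sanity check, the id/class selector logic that `getnews` applies to the listing page can be exercised against an inline HTML fragment. The markup below is hypothetical (it only mimics the assumed structure of the real page: items with `id="lineu5_<n>"` wrapping an `<a class="gray">` link), so it verifies the extraction code, not the live site.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the news listing getnews expects.
SAMPLE = (
    '<ul>'
    '<li id="lineu5_0"><a class="gray" href="info/1.htm">one</a></li>'
    '<li id="lineu5_1"><a class="gray" href="info/2.htm">two</a></li>'
    '</ul>'
)


def extract_hrefs(html, count=2):
    """Collect article hrefs the same way getnews walks the listing."""
    soup = BeautifulSoup(html, 'html.parser')
    hrefs = []
    for i in range(count):
        for item in soup.find_all(attrs={'id': 'lineu5_%d' % i}):
            for link in item.find_all(attrs={'class': 'gray'}):
                hrefs.append(link.get('href'))
    return hrefs


print(extract_hrefs(SAMPLE))  # → ['info/1.htm', 'info/2.htm']
```

Because the iteration order follows the `lineu5_<n>` ids rather than document order, a reshuffled listing would still yield links in index order.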